home *** CD-ROM | disk | FTP | other *** search
Text File | 1988-08-16 | 42.3 KB | 1,537 lines |
- -h- readme.txt Wed Jul 24 11:49:28 1985 USER$A:[MINOW.LZ]README.TXT;1
- This is a rewrite of the Unix compress utility. It is *not*
- switch-compatible with Unix compress, however it is (almost)
- file-compatible (when compiled on Unix, or when "export" mode
- is selected on VMS Version 4).
-
- The advantages of this version are as follows:
-
- 1. Compress and decompress are separate programs, simplifying the
- problems of the small system implementor. Both run on an
- unmapped PDP-11 (with a maximum of 12 bits).
-
- The command interface is just
-
- lzcomp input compressed_output
- lzdcmp compressed_input output
-
- Input files are not deleted.
-
- 2. The compression algorithm and I/O design is intended to simplify
- embedding the programs (as subroutines) in some other task.
- (for example, in a database package.)
-
- 3. On non-Unix systems, the I/O design should be significantly
- faster. It should be slightly faster on Unix.
-
- The only disadvantage is that, as noted, it is not command (option)
- compatible with Unix compress. Also, some periferal functionality
- (such as the deletion of input files and the output file naming
- conventions) have not been implemented.
-
- On Unix (i.e., in "export" mode), the compressed data file is
- identical to the Unix file, *except* that lzcomp writes two
- CLEAR codes in a row to signal end-of-file (and lzdcmp treats
- two CLEAR codes in a row as signalling end-of-file).
-
- lzcomp and lzdcmp have been added to the Decus C distribution.
-
- Martin Minow
- decvax!minow,
- minow%rex.dec@decwrl.arpa
-
- -h- lzcmp1.c Wed Jul 24 11:49:28 1985 USER$A:[MINOW.LZ]LZCMP1.C;21
- /*
- * lzcomp [-options] infile outfile
- */
- #ifdef DOCUMENTATION
-
- title lzcomp File Compression
- index File compression
-
- synopsis
- .s.nf
- lzcomp [-options] [infile [outfile]]
- .s.f
- description
-
- lzcomp implements the Lempel-Ziv file compression algorithm.
- (Files compressed by lzcomp are uncompressed by lzdcmp.)
- It operates by finding common substrings and replaces them
- with a variable-size code. This is deterministic, and
- can be done with a single pass over the file. Thus,
- the decompression procedure needs no input table, but
- can track the way the table was built.
-
- Options may be given in either case.
- .lm +8
- .p -8
- -B Input file is "binary", not "human readable text".
- This is necessary on Dec operating systems, such as VMS and
- RSX-11M, that treat these files differently. (Note that binary
- support is rudamentary and probably insufficient as yet.)
- (On VMS version 4, this is ignored unless the -x option is
- specified or the input file is record-oriented.)
- .p -8
- -M bits Write using the specified number of bits in the
- code -- necessary for big machines making files for little
- machines. For example, if compressing a file on VMS
- which is to be read on a PDP-11, you should select -M 12.
- .p -8
- -V [n] Verbose if specified. If a value is specified,
- it will enable debugging code (if compiled in).
- .p -8
- -X [n] "Export" -- write a file format that can be read by
- other operating systems. Only the bytes in the file are copied;
- file attributes are not preserved. If specified, the value
- determines the level of compatiblity. If not specified,
- or specified with an explicit value of zero, and lzcomp is
- running on Vax/VMS version 4 under VaxC and the input file
- is a disk or magtape file (block-oriented), a VMS-private output
- format is used which is incompatible with the Unix compress
- utility, but which preserves VMS file attributes. -X may
- take on the following values:
- .lm +4.s
- .i -4;#0##Choose VMS private format. See restrictions below.
- .i -4;#1##Compatible with Unix compress version 3.0:
- this is the default if -x is given without a value.
- .i -4;#2##As above, but supress "block compression"
- .i -4;#3##Supress block compression and do not output
- a compress header block. This is for compatiblity
- with a quite early version of Unix compress (and requires
- conditional-compilation to use).
- .lm -4.s
- Note that the -B (binary) option is ignored unless
- the input file is "record-oriented", such as a terminal
- or mailbox.
- .lm -8.s
- The other two arguments are the input and output
- filenames respectively. Redirection is supported,
- however, the output must be a disk/tape file.
-
- The file format is almost identical to the current
- Unix implementation of compress (V4.0). Files written
- by Unix compress should be readable by lzdcmp. Files
- written by lzcomp in export (-x) format will be
- readable by Unix compress (except that lzcomp outputs
- two "clear" codes to mark EOF. A patch to Unix
- compress is available.)
-
- VMS Restrictions
-
- VMS Private mode stores the true name and attributes
- of the input file into the compressed file and lzdcmp
- restores the attributes (and filename if requested).
- The following restrictions apply -- they may be lifted
- in the future as they are primarily due to the author's
- lack of understanding of the intricacies of of VMS I/O:
-
- All files must be stored on disk.
- The lzcomp output file must be specified directly.
-
- Also, for all usage on VMS, the compressed file must
- be written to, and read from disk.
-
- LZW compression algorithm
-
- This section is abstracted from Terry Welch's article
- referenced below. The algorithm builds a string
- translation table that maps substrings in the input
- into fixed-length codes. The compress algorithm may
- be described as follows:
-
- 1. Initialize table to contain single-character
- strings.
- 2. Read the first character. Set <w> (the prefix
- string) to that character.
- 3. (step): Read next input character, K.
- 4. If at end of file, output code(<w>); exit.
- 5. If <w>K is in the string table:
- Set <w> to <w>K; goto step 3.
- 6. Else <w>K is not in the string table.
- Output code(<w>);
- Put <w>K into the string table;
- Set <w> to K; Goto step 3.
-
- "At each execution of the basic step an acceptable input
- string <w> has been parsed off. The next character K is
- read and the extended string <w>K is tested to see if it
- exists in the string table. If it is there, then the
- extended string becomes the parsed string <w> and the
- step is repeated. If <w>K is not in the string table,
- then it is entered, the code for the successfully
- parsed string <w> is put out as comprssed data, the
- character K becomes the beginning of the next string,
- and the step is repeated."
-
- The decompression algorithm translates each received
- code into a prefix string and extension [suffix] character.
- The extension character is stored (in a push-down stack),
- and the prefix translated again, until the prefix is a
- single character, which completes decompression of this
- code. The entire code is then output by popping the
- stack.
-
- "An update to the string table is made for each code received
- (except the first one). When a code has been translated,
- its final character is used as the extension character,
- combined with the prior string, to add a new string to
- the string table. This new string is assigned a unique
- code value, which is the same code that the compressor
- assigned to that string. In this way, the decompressor
- incrementally reconstructs the same string table that
- the decompressor used.... Unfortunately ... [the algorithm]
- does not work for an abnormal case.
-
- The abnormal case occurs whenever an input character string
- contains the sequence K<w>K<w>K, where K<w> already
- appears in the compressor string table."
-
- The decompression algorithm, augmented to handle
- the abnormal case, is as follows:
-
- 1. Read first input code;
- Store in CODE and OLDcode;
- With CODE = code(K), output(K); FINchar = K;
- 2. Read next code to CODE; INcode = CODE;
- If at end of file, exit;
- 3. If CODE not in string table (special case) then
- Output(FINchar);
- CODE = OLDcode;
- INcode = code(OLDcode, FINchar);
-
- 4. If CODE == code(<w>K) then
- Push K onto the stack;
- CODE == code(<w>);
- Goto 4.
-
- 5. If CODE == code(K) then
- Output K;
- FINchar = K;
-
- 6. While stack not empty
- Output top of stack;
- Pop stack;
-
- 7. Put OLDcode,K into the string table.
- OLDcode = INcode;
- Goto 2.
-
- The algorithm as implemented here introduces two additional
- complications.
-
- The actual codes are transmitted using a variable-length
- encoding. The lowest-level routines increase the number
- of bits in the code when the largest possible code is
- transmitted.
-
- Periodically, the algorithm checks that compression is
- still increasing. If the ratio of input bytes to output
- bytes decreases, the entire process is reset. This can
- happen if the characteristics of the input file change.
-
- VMS Private File Structure
-
- In VMS Private mode, the compressed data file contains
- a variable-length (but compressed) file header with the
- file "attributes" needed by the operating system to
- construct the file. This allows the decompression
- program to recreate the file in its original format,
- which is essential if ISAM databases are compressed.
-
- The overall file format is as follows:
- .lm +8
- .p -8
- LZ_SOH "start of header" signal (this value cannot appear
- in user data).
-
- A variable-length data record (maximum 256 bytes)
- containing the header name, followed by whitespace, followed
- by header-specific information. In this case, the name
- record will contain the string "vms$attributes" followed
- by the number of bytes in the attribute data block.
- (I assume that the name record will consist of a facility
- name, such as "vms", followed by a dollar sign, followed
- by a facility-unique word.)
- .p -8
- LZ_EOR Signals "end of record".
-
- This is followed by a VMS file attributes record (generated
- by a VMS system library routine).
- .p -8
- LZ_ETX Signals "end of segment".
- .p -8
- ST_STX Signals "start of text" (i.e., start of data file).
-
- This is followed by the user data file.
- .p -8
- LZ_ETX Signals "end of segment"
- .p -8
- LZ_ETX Two in a row signals "end of file".
- .s.lm -8
- Note that this format can easily be extended to include
- trailer records (with file counts and checksums) and/or
- multiple data files in one compressed file.
-
- Note also that the LZ_CLEAR code may appear in headers
- or data files to cause the decompression program to
- "readapt" to the characteristics of the input data.
- LZ_STX and LZ_SOH reset the compression algorithm.
- LZ_EOR does not.
-
- Authors
-
- The algorithm is from "A Technique for High Performance
- Data Compression." Terry A. Welch. IEEE Computer Vol 17,
- No. 6 (June 1984), pp 8-19.
-
- This revision is by Martin Minow.
-
- Unix Compress authors are as follows:
- .s.nf
- Spencer W. Thomas (decvax!harpo!utah-cs!utah-gr!thomas)
- Jim McKie (decvax!mcvax!jim)
- Steve Davies (decvax!vax135!petsd!peora!srd)
- Ken Turkowski (decvax!decwrl!turtlevax!ken)
- James A. Woods (decvax!ihnp4!ames!jaw)
- Joe Orost (decvax!vax135!petsd!joe)
- .s.f
-
- #endif
-
- /*
- * Compatible with compress.c, v3.0 84/11/27
- */
-
- /*)BUILD
- * $(PROGRAM) = lzcomp
- * $(INCLUDE) = lz.h
- * $(FILES) = { lzcmp1.c lzcmp2.c lzcmp3.c lzio.c lzvio.c }
- */
-
- #include "lz.h"
-
-
- #ifdef unix
- #include <sys/types.h>
- #include <sys/stat.h>
- #endif
-
- /*
- * These global parameters are written to the compressed file.
- * The decompressor needs them. The initialized values are defaults
- * and are modified by command line arguments.
- */
- short maxbits = BITS; /* settable max # bits/code */
- code_int maxmaxcode = 1 << BITS; /* Totally largest code */
- code_int hsize = HSIZE; /* Actual hash table size */
-
- /*
- * Flags (command line arguments) to control compression.
- */
- #if VMS_V4
- flag export = 0; /* Assume vms "private" mode */
- #else
- flag export = 1; /* Assume Unix compatible mode */
- #endif
- flag block_compress = TRUE; /* Assume block compression */
- flag binary = FALSE; /* Reading text file if FALSE */
- flag noheader = FALSE; /* No magic header if TRUE */
- flag verbose = VERBOSE_DEFAULT; /* Non-zero for status/debug */
- flag background = FALSE; /* TRUE (Unix) if detached */
- readonly flag is_compress = TRUE; /* for lzio.c (needed?) */
- long fsize; /* Input file size in bytes */
- char *infilename = NULL; /* For error printouts */
- char *outfilename = NULL; /* For openoutput and errors */
- int firstcode; /* First code after internals */
- count_int tot_incount = 0; /* Total number of input bytes */
- count_int tot_outcount = 0; /* Total number of output codes */
- extern count_int in_count;
- extern count_int out_count;
- static long start_time; /* Time we started (in msec) */
- extern long cputime(); /* Returns process time in msec */
- STREAM instream;
- STREAM outstream;
- char_type inbuffer[MAXIO];
- char_type outbuffer[MAXIO];
- static STREAM mem_stream;
- jmp_buf failure;
- #if VMS_V4
- #include types
- #include stat
- #include descrip
- #ifndef FDLSTUFF
- #define FDLSTUFF char
- #endif
- FDLSTUFF *fdl_input;
- FDLSTUFF *fdl_output;
- static struct dsc$descriptor fdl_descriptor;
- #endif
-
- main(argc, argv)
- int argc;
- char *argv[];
- /*
- * Compress mainline
- */
- {
- #ifndef decus
- /*
- * background is TRUE if running detached from the command terminal.
- */
- background = (signal(SIGINT, SIG_IGN) == SIG_IGN) ? TRUE : FALSE;
- if (!background)
- background = !isatty(fileno(stderr));
- if (!background) {
- if (verbose > 0)
- signal(SIGINT, abort);
- else {
- signal(SIGINT, interrupt);
- signal(SIGSEGV, address_error);
- }
- }
- #endif
- if (setjmp(failure) == 0) {
- setup(argc, argv); /* Command line parameters */
- openinput(); /* Open input, set instream */
- getfilesize(); /* Get input file size */
- gethashsize(); /* Get actual hash table size */
- initialize(); /* Set maxbits and the like */
- openoutput(); /* Open output file */
- if (verbose > 0)
- start_time = cputime();
- put_magic_header();
- init_compress(TRUE);
- compress(&instream);
- #if VMS_V4
- if (export == 0) {
- outputcode((code_int) LZ_ETX);
- outputcode((code_int) LZ_ETX);
- fdl_close(fdl_input);
- }
- else
- #endif
- if (block_compress) {
- outputcode((code_int) LZ_CLEAR);
- outputcode((code_int) LZ_CLEAR);
- }
- outputcode((code_int) -1); /* Flush output buffers */
- #if VMS_V4
- if (export == 0)
- fdl_close(fdl_output);
- else {
- fclose(stdout);
- }
- #else
- fclose(stdout);
- #endif
- if (verbose > 0) {
- start_time = cputime() - start_time;
- tot_incount += in_count;
- tot_outcount += out_count;
- fprintf(stderr, "%ld chars in, %ld bytes out, ",
- tot_incount, tot_outcount);
- if (tot_outcount > 0) {
- divout("compression ratio: ",
- (long) tot_incount, (long) tot_outcount, "");
- divout(" (",
- ((long) tot_incount - (long) tot_outcount) * 100,
- (long) tot_incount, "%)\n");
- }
- fprintf(stderr,
- "%ld.%02ld seconds (process time) for compression.\n",
- start_time / 1000L, (start_time % 1000L) / 10L);
- if (start_time > 0) {
- divout(" ", (long) tot_incount * 10L,
- (start_time + 50L) / 100L,
- " input bytes per second.\n");
- }
- }
- exit(IO_SUCCESS);
- }
- else {
- fprintf(stderr, "Error when compressing \"%s\" to \"%s\"\n",
- (infilename == NULL) ?
- "<input file unknown>" : infilename,
- (outfilename == NULL) ?
- "<output file unknown>" : outfilename);
- if (errno != 0)
- perror("lzcomp fatal error");
- exit(IO_ERROR);
- }
- }
-
- divout(leader, numer, denom, trailer)
- char *leader;
- long numer;
- long denom;
- char *trailer;
- /*
- * Print numer/denom without floating point on small machines.
- */
- {
- fprintf(stderr, "%s%ld.%02ld%s",
- leader, numer / denom, ((numer % denom) * 100L) / denom, trailer);
- }
-
- static
- initialize()
- /*
- * Mung some global values.
- */
- {
- if (maxbits < INIT_BITS) /* maxbits is set by the -M */
- maxbits = INIT_BITS; /* option. Make sure it's */
- if (maxbits > BITS) /* within a reasonable range */
- maxbits = BITS;
- maxmaxcode = 1 << maxbits; /* Truly biggest code */
- if (export == 0)
- firstcode = LZ_FIRST; /* VMS private */
- else if (block_compress)
- firstcode = LZ_CLEAR + 1; /* Default */
- else
- firstcode = 256; /* Backwards compatible */
- }
-
- put_magic_header()
- /*
- * Write the magic header bits.
- */
- {
- #ifndef COMPATIBLE
- if (export && !noheader) {
- PUT(HEAD1_MAGIC, &outstream);
- PUT(HEAD2_MAGIC, &outstream);
- PUT(maxbits | ((block_compress) ? BLOCK_MASK : 0),
- &outstream);
- }
- #if VMS_V4
- else if (export == 0) {
- char text[256];
- /*
- * VMS private mode (with attribute block)
- */
- PUT(HEAD1_MAGIC, &outstream);
- PUT(VMS_HEAD2_MAGIC, &outstream);
- PUT((char) (maxbits | BLOCK_MASK), &outstream);
- PUT(firstcode - 0x100, &outstream);
- init_compress();
- outputcode(LZ_SOH);
- #if DEBUG
- if (strlen(ATT_NAME) != ATT_SIZE) {
- fprintf("\"%s\", expected %d, got %d\n",
- ATT_NAME, ATT_SIZE, strlen(ATT_NAME));
- }
- #endif
- sprintf(text, "%s%d;", ATT_NAME, fdl_descriptor.dsc$w_length);
- mem_compress(text, strlen(text));
- outputcode(LZ_EOR);
- mem_compress(fdl_descriptor.dsc$a_pointer,
- fdl_descriptor.dsc$w_length);
- fdl_free(&fdl_descriptor);
- outputcode(LZ_ETX);
- outputcode(LZ_STX);
- }
- #endif
- #endif
- }
-
- mem_compress(datum, length)
- char_type *datum;
- int length;
- /*
- * Compress from memory
- */
- {
- mem_stream.bp = mem_stream.bstart = datum;
- mem_stream.bsize = length;
- mem_stream.bend = datum + length;
- mem_stream.func = lz_eof;
- compress(&mem_stream);
- }
-
- /*
- * This routine is used to tune the hash table size according to
- * the file size. If the filesize is unknown, fsize should be
- * set to zero.
- */
-
- typedef struct TUNETAB {
- long fsize;
- code_int hsize;
- } TUNETAB;
-
- static readonly TUNETAB tunetab[] = {
- #if HSIZE > 5003
- { 1 << 12, 5003 },
- #endif
- #if HSIZE > 9001
- { 1 << 13, 9001 },
- #endif
- #if HSIZE > 18013
- { 1 << 14, 18013 },
- #endif
- #if HSIZE > 35023
- { 1 << 15, 35023 },
- { 47000, 50021 },
- #endif
- { 0, 0 },
- };
-
- static
- gethashsize()
- /*
- * Tune the hash table parameters for small files.
- * We don't have a good way to find the file size on vms V3.
- * fsize is set to zero if we can't find it.
- */
- {
- register TUNETAB *tunep;
-
- hsize = HSIZE;
- if (fsize > 0) {
- for (tunep = tunetab; tunep->fsize != 0; tunep++) {
- if (fsize < tunep->fsize) {
- hsize = tunep->hsize;
- break;
- }
- }
- }
- }
-
- static
- getfilesize()
- /*
- * Set fsize to the input filesize (in bytes) if possible.
- * Magic for all operating systems.
- */
- {
- #ifdef rsx
- extern char f_efbk; /* F.EFBK -- highest block in file */
- #define fdb(p,offset) (stdin->io_fdb[((int) &p + offset)] & 0xFF)
- #define efbk(offset) fdb(f_efbk, offset)
- extern char f_rtyp; /* F.RTYP -- Record type */
- extern char f_ratt; /* F.RATT -- Record attributes */
- /*
- * Note: Block number is stored high-order word first.
- */
- fsize = efbk(2)
- + (efbk(3) << 8)
- + (efbk(0) << 16)
- + (efbk(1) << 24);
- fsize *= 512;
- #endif
- #ifdef rt11
- fsize = stdin->io_size; /* Set by Decus C */
- fsize *= 512;
- #endif
- #ifdef vms
- #if VMS_V4
- struct stat statbuf;
-
- fsize = 0;
- if (export != 0) {
- if (fstat(fileno(stdin), &statbuf) == 0)
- fsize = (long) statbuf.st_size;
- }
- else {
- fsize = (long) fdl_fsize(fdl_input);
- }
- #else
- fsize = 0; /* Can't find filesize */
- #endif
- #endif
- #ifdef unix
- struct stat statbuf;
-
- fsize = 0;
- if (fstat(fileno(stdin), &statbuf) == 0)
- fsize = (long) statbuf.st_size;
- #endif
- }
-
- static readonly char *helptext[] = {
- "The following options are valid:",
- "-B\tBinary file (important on VMS/RSX, ignored on Unix)",
- "-M val\tExplicitly set the maximum number of code bits",
- "-V val\tPrint status information (or debugging) to stderr",
- "-X val\tSet export (compatiblity) mode:",
- #if VMS_V4
- " -X 0\tExplicitly choose VMS Private mode",
- #endif
- " -X 1\t(default if -X specified, output format is compatible",
- "\twith Unix compress V3.0",
- " -X 2\tCompatible with Unix compress 3.0, block compression",
- "\tsupressed.",
- #ifdef COMPATIBLE
- " -X 3No header (file is readable by old compress)",
- #endif
- NULL,
- };
-
- static
- setup(argc, argv)
- int argc;
- char *argv[];
- /*
- * Get parameters and open files. Exit fatally on errors.
- */
- {
- register char *ap;
- register int c;
- char **hp;
- auto int i;
- int j;
-
- #ifdef vms
- argc = getredirection(argc, argv);
- #endif
- for (i = j = 1; i < argc; i++) {
- ap = argv[i];
- if (*ap++ != '-' || *ap == EOS) /* Filename? */
- argv[j++] = argv[i]; /* Just copy it */
- else {
- while ((c = *ap++) != EOS) {
- if (islower(c))
- c = toupper(c);
- switch (c) {
- case 'B':
- binary = TRUE;
- break;
-
- case 'M':
- maxbits = getvalue(ap, &i, argv);
- if (maxbits < MIN_BITS) {
- fprintf(stderr, "Illegal -M value\n");
- goto usage;
- }
- break;
-
- case 'V':
- verbose = getvalue(ap, &i, argv);
- break;
-
- case 'X':
- export = getvalue(ap, &i, argv);
- if (export < 0 || export > 3) {
- fprintf(stderr, "Illegal -X value: %d\n", export);
- goto usage;
- }
- block_compress = "\1\1\0\0"[export];
- noheader = "\0\0\0\1"[export];
- export = "\0\1\1\1"[export];
- break;
-
- default:
- fprintf(stderr, "Unknown option '%c' in \"%s\"\n",
- *ap, argv[i]);
- usage: for (hp = helptext; *hp != NULL; hp++)
- fprintf(stderr, "%s\n", *hp);
- FAIL("usage");
- } /* Switch on options */
- } /* Everything for -xxx */
- } /* If -option */
- } /* For all argc's */
- /* infilename = NULL; */ /* Set "stdin" signal */
- /* outfilename = NULL; */ /* Set "stdout" signal */
- switch (j) { /* Any file arguments? */
- case 3: /* both files given */
- if (!streq(argv[2], "-")) /* But - means stdout */
- outfilename = argv[2];
- case 2: /* Input file given */
- if (!streq(argv[1], "-")) {
- infilename = argv[1];
- }
- break;
-
- case 0: /* None! */
- case 1: /* No file arguments */
- break;
-
- default:
- fprintf(stderr, "Too many file arguments\n");
- FAIL("too many files");
- }
- }
-
- static int
- getvalue(ap, ip, argv)
- register char *ap;
- int *ip;
- char *argv[];
- /*
- * Compile a "value". We are currently scanning *ap, part of argv[*ip].
- * The following are possible:
- * -x123 return (123) and set *ap to EOS so the caller
- * ap^ cycles to the next argument.
- *
- * -x 123 *ap == EOS and argv[*ip + 1][0] is a digit.
- * return (123) and increment *i to skip over the
- * next argument.
- *
- * -xy or -x y return(1), don't touch *ap or *ip.
- *
- * Note that the default for "flag option without value" is 1. This
- * can only cause a problem for the -M option where the value is
- * mandatory. However, the result of 1 is illegal as it is less
- * than INIT_BITS.
- */
- {
- register int result;
- register int i;
-
- i = *ip + 1;
- if (isdigit(*ap)) {
- result = atoi(ap);
- *ap = EOS;
- }
- else if (*ap == EOS
- && argv[i] != NULL
- && isdigit(argv[i][0])) {
- result = atoi(argv[i]);
- *ip = i;
- }
- else {
- result = 1;
- }
- return (result);
- }
-
- openinput()
- {
- #ifdef decus
- if (infilename == NULL) {
- infilename = malloc(256 + 1);
- fgetname(stdin, infilename);
- infilename = realloc(infilename, strlen(infilename) + 1);
- }
- else {
- if (freopen(infilename, (binary) ? "rn" : "r", stdin) == NULL) {
- perror(infilename);
- FAIL("can't reopen input");
- }
- }
- #else
- #ifdef vms
- #if VMS_V4
- if (export == 0) {
- char *fname;
- char filename[256];
-
- if ((fname = infilename) == NULL) {
- fgetname(stdin, filename);
- fname = filename;
- }
- if ((fdl_input = fdl_open(fname, &fdl_descriptor)) == NULL) {
- if ((fdl_status & 01) == 0) {
- fdl_message(NULL, fname);
- FAIL("can't fdl_open");
- }
- fprintf(stderr,
- "Cannot open \"%s\" in vms private format,", fname);
- fprintf(stderr, " trying export format.\n");
- export = TRUE;
- goto try_export;
- }
- fclose(stdin);
- stdin = NULL;
- infilename = malloc(256 + 1);
- infilename = realloc(fname, strlen(fname) + 1);
- if (verbose > 1) {
- fprintf(stderr, "FDL information for \"%s\"\n", filename);
- fdl_dump(&fdl_descriptor, stderr);
- }
- goto opened;
- }
- try_export:
- #endif
- if (infilename == NULL) {
- infilename = malloc(256 + 1);
- fgetname(stdin, infilename);
- infilename = realloc(infilename, strlen(infilename) + 1);
- }
- else {
- #if VMS_V4
- if ((stdin = freopen(infilename, "r", stdin)) == NULL) {
- #else
- if (freopen(infilename, "r", stdin) == NULL) {
- #endif
- perror(infilename);
- exit(IO_ERROR);
- }
- }
- #else
- if (infilename == NULL)
- infilename = "stdin";
- else {
- if (freopen(infilename, "r", stdin) == NULL) {
- perror(infilename);
- exit(IO_ERROR);
- }
- }
- #endif
- #endif
- opened: instream.bp = instream.bend = NULL;
- instream.bstart = inbuffer;
- instream.bsize = sizeof inbuffer;
- instream.func = lz_fill;
- }
-
- openoutput()
- /*
- * Open the output file (after the input file has been opened).
- * if outfilename == NULL, it's already open on stdout.
- */
- {
- if (outfilename == NULL) {
- #if VMS_V4
- #if 0 /* The following doesn't work */
- outfilename = malloc(256 + 1);
- fgetname(stdout, outfilename);
- outfilename = realloc(outfilename, strlen(outfilename) + 1);
- if (export == 0) {
- fclose(stdout);
- stdout = NULL; /* Can't do terminal test below */
- if ((fdl_output = fdl_create(NULL, outfilename)) == NULL) {
- if ((fdl_status & 01) == 0)
- fdl_message(NULL, outfilename);
- fprintf(stderr, "Can't create \"%s\"\n", outfilename);
- FAIL("can't fdl_create");
- }
- }
- #else
- fprintf(stderr,
- "Restriction: The output file must be specified.\n");
- FAIL("can't redirect on VMS V4");
- #endif
- #else
- #ifdef vms
- outfilename = malloc(256 + 1);
- fgetname(stdout, outfilename);
- outfilename = realloc(outfilename, strlen(outfilename) + 1);
- #else
- #ifdef decus
- outfilename = malloc(256 + 1);
- fgetname(stdout, outfilename);
- outfilename = realloc(outfilename, strlen(outfilename) + 1);
- #else
- outfilename = "<stdout>";
- #endif
- #endif
- #endif
- }
- else {
- #if VMS_V4
- if (export == 0) {
- fclose(stdout);
- stdout = NULL; /* Can't do terminal test below */
- if ((fdl_output = fdl_create(NULL, outfilename)) == NULL) {
- if ((fdl_status & 01) == 0)
- fdl_message(NULL, outfilename);
- fprintf(stderr,
- "Can't create \"%s\" (VMS private)\n", outfilename);
- FAIL("can't fdl_create");
- }
- }
- else {
- if (freopen(outfilename, "w", stdout) == NULL) {
- perror(outfilename);
- FAIL("can't create");
- }
- }
- #else
- #ifdef decus
- if (freopen(outfilename, "wn", stdout) == NULL) {
- perror(outfilename);
- FAIL("can't create");
- }
- #else
- if (freopen(outfilename, "w", stdout) == NULL) {
- perror(outfilename);
- FAIL("can't create");
- }
- #endif
- #endif
- }
- if (stdout != NULL && isatty(fileno(stdout))) {
- fprintf(stderr, "%s: is a terminal. We object.\n",
- outfilename);
- FAIL("can't create");
- }
- outstream.bp = outstream.bstart = outbuffer;
- outstream.bend = outbuffer + sizeof outbuffer;
- outstream.bsize = sizeof outbuffer;
- outstream.func = lz_flush;
- }
-
- -h- lzcmp2.c Wed Jul 24 11:49:28 1985 USER$A:[MINOW.LZ]LZCMP2.C;8
- /*
- * l z c m p 2 . c
- *
- * Actually do compression. Terminology (and algorithm):
- *
- * Assume the input string is "abcd", we have just processed "ab" and
- * read 'c'. At this point, a "prefix code" will be assigned to "ab".
- * Search in the prefix:character memory (either the "fast memory" or
- * the hash-code table) for the code followed by this character. If
- * found, assign the code found to the "prefix code" and read the
- * next character. If not found, output the current prefix code,
- * generate a new prefix code and store "old_prefix:char" in the
- * table with "new_prefix" as its definition.
- *
- * Naming conventions:
- * code a variable containing a prefix code
- * c or char a variable containing a character
- *
- * There are three tables that are searched (dependent on compile-time
- * and execution time considerations):
- * fast Direct table-lookup -- requires a huge amount of physical
- * (non-paged) memory, but is very fast.
- * hash Hash-coded table-lookup.
- * cache A "look-ahead" cache for the hash table that optimizes
- * searching for the most frequent character. This considerably
- * speeds up processing for raster-images (for example) at
- * a modest amount of memory.
- * Structures are used to hold the actual tables to simplify organization
- * of the program.
- *
- * Subroutines:
- * compress() performs data compression on an input datastream.
- * init_compress() called by the output routine to clear tables.
- */
-
- #include "lz.h"
-
- /*
- * General variables
- * Cleared by init_compress on a "hard initialization"
- * outputcode() in lzcmp3.c refers to next_code.
- */
-
- long int in_count; /* Length of input */
- long int out_count; /* Bytes written to output file */
- static flag first_clear = TRUE; /* Don't zero first time */
- code_int next_code; /* Next output code */
- static count_int checkpoint = CHECK_GAP; /* When to test ratio again */
- static long ratio = 0; /* Ratio for last segment */
-
- /*
- * These global parameters are set by mainline code. Unchanged here.
- */
- extern short maxbits; /* Settable max # bits/code */
- extern short block_compress; /* For old-style compatibility */
- extern code_int maxmaxcode; /* Actual maximum output code */
- extern long tot_incount; /* Total input count */
- extern long tot_outcount; /* Total output count */
- extern code_int hsize; /* Actual hash table size */
-
- #ifdef XENIX_16
- static count_int htab0[8192];
- static count_int htab1[8192];
- static count_int htab2[8192];
- static count_int htab3[8192];
- static count_int htab4[8192];
- static count_int htab5[8192];
- static count_int htab6[8192];
- static count_int htab7[8192];
- static count_int htab8[HSIZE - 65536];
- static count_int *hashtab[9] = {
- htab0, htab1, htab2, htab3, htab4, htab5, htab6, htab7, htab8
- };
-
- static U_short code0[16384];
- static U_short code1[16384];
- static U_short code2[16384];
- static U_short code3[16384];
- static U_short code4[HSIZE - 65536];
- static U_short *codetab[5] = {
- code0, code1, code3, code3, code4
- }
-
- #define HASH(i) (hashtab[((unsigned) (i)) >> 13][(i) & 0x1FFF])
- #define CODE(i) (codetab[((unsigned) (i)) >> 14][(i) & 0x3FFF])
-
- #else
- count_int hashtab[HSIZE];
- U_short codetab[HSIZE];
-
- #define HASH(i) hashtab[i]
- #define CODE(i) codetab[i]
- #endif
-
- /*
- * compress a datastream
- *
- * Algorithm: on large machines, for maxbits <= FBITS, use fast direct table
- * lookup on the prefix code / next character combination. For smaller code
- * size, use open addressing modular division double hashing (no chaining), ala
- * Knuth vol. 3, sec. 6.4 Algorithm D, along with G. Knott's relatively-prime
- * secondary probe. Do block compression with an adaptive reset, whereby the
- * code table is cleared when the compression ratio decreases, but after the
- * table fills. The variable-length output codes are re-sized at this point,
- * and a special LZ_CLEAR code is generated for the decompressor. For the
- * megamemory version, the sparse array is cleared indirectly through a
- * "shadow" output code history. Late additions: for the hashing code,
- * construct the table according to file size for noticeable speed improvement
- * on small files. Also detect and cache codes associated with the most
- * common character to bypass hash calculation on these codes (a characteristic
- * of highly-compressable raster images). Please direct questions about this
- * implementation to ames!jaw.
- */
-
- compress(in)
- STREAM *in; /* Input stream structure */
- /*
- * Compress driver. Global fsize is the size of the entire datastream
- * (from LZ_STX or LZ_SOH to the terminating LZ_ETX). You must
- * force a reinitialization -- by calling outputcode() with a new header --
- * if size is changed. If the "newer" output format is chosen (with
- * data streams delimited by LZ_SOH/LZ_STX, init_compress will be
- * called automatically. Otherwise, you must call init_compress(TRUE)
- * before calling compress() for the first time.
- */
- {
- register long hash_code; /* What we look for */
- register code_int i; /* Index into vectors */
- register int c; /* Current input char */
- register code_int code; /* Substring code */
- register int displacement; /* For secondary hash */
- register code_int hsize_reg; /* Size of hash table */
- register int hshift; /* For xor hasher */
-
- if ((code = GET(in)) == EOF)
- return;
- in_count++;
- hsize_reg = hsize;
- /*
- * Set hash code range bound
- */
- hshift = 0;
- for (hash_code = (long) hsize; hash_code < 65536L; hash_code <<= 1)
- hshift++;
- hshift = 8 - hshift;
- while ((c = GET(in)) != (unsigned) EOF) {
- in_count++;
- hash_code = (long) (((long) c << maxbits) + code);
- i = (c << hshift) ^ code; /* XOR hashing */
- if (HASH(i) == hash_code) { /* Found at first slot? */
- code = CODE(i);
- continue;
- }
- else if ((long) HASH(i) < 0) /* empty slot */
- goto nomatch;
- displacement = hsize_reg - i; /* secondary hash */
- if (i == 0)
- displacement = 1;
- probe:
- if ((i -= displacement) < 0) /* Wrap around? */
- i += hsize_reg;
- if (HASH(i) == hash_code) { /* Found in hash table? */
- code = CODE(i); /* Set new prefix code */
- continue; /* Read next input char */
- }
- else if ((long) HASH(i) > 0) /* If slot is occupied */
- goto probe; /* Look somewhere else */
- nomatch:
- /*
- * Output the current prefix and designate a new prefix.
- * If the input character was the "hog", save it in the
- * look-ahead cache table. Then, save in the hash table.
- */
- outputcode((code_int) code); /* No match, put prefix */
- #if SIGNED_COMPARE_SLOW
- if ((unsigned) next_code < (unsigned) maxmaxcode) {
- #else
- if (next_code < maxmaxcode) {
- #endif
- CODE(i) = next_code++; /* code -> hashtable */
- HASH(i) = hash_code;
- }
- else if (block_compress
- && (count_int) in_count >= checkpoint) {
- clear();
- }
- code = c; /* Start new substring */
- }
- /*
- * At EOF, put out the final code.
- */
- outputcode((code_int) code);
- }
-
- clear()
- /*
- * Check the compression ratio to see whether it is going up
- * or staying the same. If it is going down, the internal
- * statistics of the file have changed, so clear out our
- * tables and start over. Inform the decompressor of the
- * change by sending out a LZ_CLEAR code.
- */
- {
- register long int rat;
-
- checkpoint = in_count + CHECK_GAP;
- #if DEBUG
- if (verbose > 2) {
- divout("at clear() test", in_count, out_count, "");
- fprintf(stderr, ", ratio at entry: %ld.%02ld, gap %d\n",
- rat / 256L, ((rat & 255L) * 100L) / 256L, CHECK_GAP);
- }
- #endif
- if (in_count > 0x007FFFFL) { /* Shift will overflow */
- rat = out_count >> 8;
- if (rat == 0)
- rat = 0x7FFFFFFFL;
- else {
- rat = in_count / rat;
- }
- }
- else {
- rat = (in_count << 8) / out_count;
- }
- if (rat > ratio)
- ratio = rat;
- else {
- #if DEBUG
- if (verbose > 0) {
- fprintf(stderr, "Resetting compression, in %ld, out %ld\n",
- in_count, out_count);
- fprintf(stderr, "Old ratio: %ld == (%ld.%02ld)",
- ratio, ratio / 256L, ((ratio & 255L) * 100L) / 256L);
- fprintf(stderr, ", test ratio: %ld = (%ld.%02ld), gap %d\n",
- rat, rat / 256L, ((rat & 255L) * 100L) / 256L, CHECK_GAP);
- }
- #endif
- outputcode((code_int) LZ_CLEAR); /* Calls init_compress */
- }
- }
-
- init_compress(full_init)
- flag full_init; /* TRUE for full initialization */
- /*
- * Clear the tables. Called by outputcode() on LZ_SOH, LZ_STX
- * (full_init TRUE) or on LZ_CLEAR (full_init FALSE).
- * init_compress() is not called on LZ_EOR.
- */
- {
- #ifdef XENIX_16
- register count_int *hp;
- register int n;
- register int j;
- register code_int k;
-
- k = hsize;
- for (j = 0; k > 0; k -= 8192) {
- i = (k < 8192) ? k : 8192;
- hp = hashtab[j++];
- n = i >> 4;
- switch (i & 15) {
- case 15: *hp++ = -1;
- case 14: *hp++ = -1;
- case 13: *hp++ = -1;
- case 12: *hp++ = -1;
- case 11: *hp++ = -1;
- case 10: *hp++ = -1;
- case 9: *hp++ = -1;
- case 8: *hp++ = -1;
- case 7: *hp++ = -1;
- case 6: *hp++ = -1;
- case 5: *hp++ = -1;
- case 4: *hp++ = -1;
- case 3: *hp++ = -1;
- case 2: *hp++ = -1;
- case 1: *hp++ = -1;
- }
- while (--n >= 0) {
- *hp++ = -1; *hp++ = -1; *hp++ = -1; *hp++ = -1;
- *hp++ = -1; *hp++ = -1; *hp++ = -1; *hp++ = -1;
- *hp++ = -1; *hp++ = -1; *hp++ = -1; *hp++ = -1;
- *hp++ = -1; *hp++ = -1; *hp++ = -1; *hp++ = -1;
- }
- }
- #else
- register count_int *hp;
- register code_int n;
-
- hp = &hashtab[0];
- n = hsize >> 4; /* divide by 16 */
- switch (hsize & 15) {
- case 15: *hp++ = -1;
- case 14: *hp++ = -1;
- case 13: *hp++ = -1;
- case 12: *hp++ = -1;
- case 11: *hp++ = -1;
- case 10: *hp++ = -1;
- case 9: *hp++ = -1;
- case 8: *hp++ = -1;
- case 7: *hp++ = -1;
- case 6: *hp++ = -1;
- case 5: *hp++ = -1;
- case 4: *hp++ = -1;
- case 3: *hp++ = -1;
- case 2: *hp++ = -1;
- case 1: *hp++ = -1;
- }
- while (--n >= 0) {
- *hp++ = -1; *hp++ = -1; *hp++ = -1; *hp++ = -1;
- *hp++ = -1; *hp++ = -1; *hp++ = -1; *hp++ = -1;
- *hp++ = -1; *hp++ = -1; *hp++ = -1; *hp++ = -1;
- *hp++ = -1; *hp++ = -1; *hp++ = -1; *hp++ = -1;
- }
- #endif
- if (full_init) {
- tot_incount += in_count;
- tot_outcount += out_count;
- in_count = 0;
- out_count = 0;
- ratio = 0;
- }
- first_clear = FALSE;
- next_code = firstcode;
- }
- -h- lzcmp3.c Wed Jul 24 11:49:28 1985 USER$A:[MINOW.LZ]LZCMP3.C;10
- /*
- * l z c m p 3 . c
- * Output a given code.
- */
- #include "lz.h"
-
- extern STREAM outstream;
- extern code_int next_code;
- extern code_int maxmaxcode; /* Actual maximum output code */
- extern short maxbits;
- extern count_int out_count;
-
- static char_type buf[BITS];
- static int offset;
- static short n_bits = INIT_BITS; /* # of bits in compressed file */
- static short n_bits8 = INIT_BITS << 3;
- static code_int maxcode = MAXCODE(INIT_BITS);
-
- #if !vax_asm
- static readonly char_type lmask[9] = {
- 0xFF, 0xFE, 0xFC, 0xF8, 0xF0, 0xE0, 0xC0, 0x80, 0x00
- };
- static readonly char_type rmask[9] = {
- 0x00, 0x01, 0x03, 0x07, 0x0F, 0x1F, 0x3F, 0x7F, 0xFF
- };
- #endif
-
- #if DEBUG
- extern int col;
- static int todump;
- #endif
-
- outputcode(code)
- code_int code;
- /*
- * Output the given code.
- * Inputs:
- * code: A n_bits-bit integer. If == -1, then EOF. This assumes
- * that n_bits <= (long)wordsize - 1.
- * Note: if not in "export" mode, the following values are special:
- * LZ_CLEAR (also in export mode if block_compress TRUE)
- * (soft) clear out compress tables and reset the
- * number of bits per code to the minimum.
- * LZ_SOH, LZ_STX (hard) clear out compress tables and reset as above.
- * LZ_ETX, LZ_EOR force out the current output segment, analogous
- * to fflush.
- *
- * Outputs:
- * Outputs code to the file. If the codespace has filled
- * (next_code >= (1 << n_bits), increase n_bits.
- * If LZ_CLEAR, LZ_SOH, or LZ_STX is seen, reset n_bits
- * to the initial value and call init_compress to reset
- * the lookup and cache tables.
- *
- * Assumptions:
- * Output chars are 8 bits long. This is deeply hardwired
- * into the algorithm. It is independent, however, of the
- * size of the input data.
- *
- * Algorithm:
- * Maintain a BITS character long buffer (so that 8 codes will
- * fit in it exactly). Use the VAX insv instruction to insert each
- * code in turn. When the buffer fills up empty it and start over.
- */
- {
- /*
- * On the VAX (Unix), it is important to have the register declarations
- * in exactly the order given, or the asm will break.
- */
- register int r_off, bits;
- register char_type *bp;
- #if !vax_asm
- register code_int r_code;
- #endif
-
- r_off = offset;
- bits = n_bits;
- bp = buf;
- if (code >= 0) {
- /*
- * Not at EOF, add the code
- */
- #if DEBUG
- if (verbose > 3) {
- fprintf(stderr, "%c%5d %5d",
- ((col += 12) >= 72) ? (col = 0, '\n') : ' ',
- code, next_code);
- if (code >= LZ_CLEAR && code < firstcode) {
- fprintf(stderr, " = %s", lz_names[code - LZ_CLEAR]);
- col = 74;
- }
- }
- #endif
- #if vax_asm
- /*
- * VAX DEPENDENT!! Implementation on other machines may be
- * difficult.
- *
- * Translation: Insert BITS bits from the argument starting at
- * offset bits from the beginning of buf.
- */
- 0; /* C compiler bug ?? */
- asm("insv 4(ap),r11,r10,(r9)");
- #else
- /*
- * WARNING: byte/bit numbering on the vax is simulated
- * by the following code
- */
- bp += (r_off >> 3); /* -> first output slot */
- r_off &= 7;
- /*
- * Since code is always >= 8 bits, only need to mask the first
- * hunk on the left.
- */
- r_code = code;
- *bp = (*bp & rmask[r_off]) | (r_code << r_off) & lmask[r_off];
- bp++;
- bits -= (8 - r_off);
- r_code >>= 8 - r_off;
- /*
- * Get any 8 bit parts in the middle ( <= 1 for up to 16 bits).
- */
- if (bits >= 8) {
- *bp++ = r_code;
- r_code >>= 8;
- bits -= 8;
- }
- if (bits != 0) /* Last bits. */
- *bp = r_code;
- #endif
- offset += n_bits;
- if (offset == n_bits8) {
- out_count += n_bits;
- lz_putbuf(buf, n_bits, &outstream);
- #if DEBUG
- if (todump > 0) {
- dumphex(buf, n_bits, stderr);
- todump -= n_bits;
- }
- #endif
- offset = 0;
- }
- /*
- * If the next entry is going to be too big for the code size,
- * then increase it, if possible. Note:
- * !export firstcode == LZ_FIRST
- * export && block_compress firstcode == LZ_CLEAR + 1
- * export && !block_compress firstcode == LZ_CLEAR
- */
- if (next_code > maxcode) {
- if (offset > 0) {
- lz_putbuf(buf, n_bits, &outstream);
- out_count += n_bits;
- offset = 0;
- #if DEBUG
- if (todump > 0) {
- dumphex(buf, n_bits, stderr);
- todump -= n_bits;
- }
- #endif
- }
- n_bits++; /* Need more bits */
- n_bits8 += (1 << 3);
- if (n_bits == maxbits)
- maxcode = maxmaxcode;
- else
- maxcode = MAXCODE(n_bits);
- #if DEBUG
- if (verbose > 2) {
- fprintf(stderr,
- "%snext_code %d, change to %d bits max %d",
- (col > 0) ? "\n" : "", next_code,
- n_bits, maxcode);
- col = 74;
- }
- #endif
- }
- if (code >= LZ_CLEAR && code < firstcode) {
- switch (code) {
- case LZ_SOH:
- case LZ_STX:
- case LZ_CLEAR:
- if (offset > 0) {
- lz_putbuf(buf, n_bits, &outstream);
- out_count += n_bits;
- offset = 0;
- #if DEBUG
- if (todump > 0) {
- dumphex(buf, n_bits, stderr);
- todump -= n_bits;
- }
- #endif
- }
- n_bits = INIT_BITS; /* Reset codes */
- n_bits8 = INIT_BITS << 3;
- maxcode = MAXCODE(INIT_BITS);
- init_compress(code != LZ_CLEAR);
- #if DEBUG
- if (verbose > 2) {
- fprintf(stderr,
- "\n(%s) Change to %d bits, maxcode %d, next_code = %d",
- lz_names[code - LZ_CLEAR],
- n_bits, maxcode, next_code);
- col = 74;
- todump = 32;
- }
- #endif
- break;
-
- case LZ_EOR:
- case LZ_ETX: /* Just written out */
- break;
-
- default:
- abort(); /* Can't happen */
- }
- }
- }
- else {
- /*
- * At EOF, write the rest of the buffer.
- */
- if ((r_off = offset) > 0) {
- r_off += 7;
- r_off >>= 3;
- lz_putbuf(buf, r_off, &outstream);
- out_count += r_off;
- #if DEBUG
- if (todump > 0) {
- dumphex(buf, r_off, stderr);
- todump -= r_off;
- }
- #endif
- }
- offset = 0;
- lz_flush(&outstream); /* Flush output buffer */
- #if DEBUG
- if (verbose > 3 || todump > 0) {
- fprintf(stderr, "\n*EOF*\n");
- col = 0;
- }
- #endif
- }
- }
-